news headline
LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism
Babalola, Olusola, Ojokoh, Bolanle, Boyinbode, Olutayo
This research examines the potential of datasets generated by Large Language Models (LLMs) to support Natural Language Processing (NLP) tasks, aiming to overcome challenges related to data acquisition and privacy concerns associated with real-world data. Focusing on negative valence text, a critical component of sentiment analysis, we explore the use of LLM-generated synthetic news headlines as an alternative to real-world data. A specialized corpus of negative news headlines was created using tailored prompts to capture diverse negative sentiments across various societal domains. The synthetic headlines were validated by expert review and further analyzed in embedding space to assess their alignment with real-world negative news in terms of content, tone, length, and style. Key metrics such as correlation with real headlines, perplexity, coherence, and realism were evaluated. The synthetic dataset was benchmarked against two sets of real news headlines using evaluations including the Comparative Perplexity Test, Comparative Readability Test, Comparative POS Profiling, BERTScore, and Comparative Semantic Similarity. Results show the generated headlines match real headlines with the only marked divergence being in the proper noun score of the POS profile test.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Media > News (1.00)
- Government (1.00)
- Health & Medicine (0.93)
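The Comparative Perplexity Test named in the abstract above can be illustrated with a rough sketch. The abstract does not say which language model the authors scored headlines with, so the add-one-smoothed unigram model below is a stand-in chosen only because it needs no external libraries. The function names and the toy headlines are hypothetical and are not taken from the dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return text.lower().split()


def train_unigram(headlines):
    """Count tokens in a reference corpus of headlines."""
    counts = Counter()
    for headline in headlines:
        counts.update(tokenize(headline))
    return counts


def perplexity(counts, headlines):
    """Perplexity of `headlines` under an add-one-smoothed unigram model."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen tokens
    log_prob, n_tokens = 0.0, 0
    for headline in headlines:
        for tok in tokenize(headline):
            p = (counts.get(tok, 0) + 1) / (total + vocab)
            log_prob += math.log(p)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("no tokens to score")
    return math.exp(-log_prob / n_tokens)


# Toy, made-up inputs purely for illustration (assumption: not real data).
real_reference = ["flood kills dozens in coastal town",
                  "factory fire leaves hundreds jobless"]
real_heldout = ["fire kills dozens in factory"]
synthetic = ["coastal flood leaves hundreds jobless"]

model = train_unigram(real_reference)
print("held-out real:", perplexity(model, real_heldout))
print("synthetic:    ", perplexity(model, synthetic))
```

Under this reading, synthetic headlines whose perplexity is close to that of held-out real headlines would count as distributionally similar; a neural language model would give more meaningful numbers than this unigram baseline.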
Simulating Misinformation Vulnerabilities With Agent Personas
Farr, David, Ng, Lynnette Hui Xian, Prochaska, Stephen, Cruickshank, Iain J., West, Jevin
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA ABSTRACT Disinformation campaigns can distort public perception and destabilize institutions. Understanding how different populations respond to information is crucial for designing effective interventions, yet real-world experimentation is impractical and ethically challenging. To address this, we develop an agent-based simulation using Large Language Models (LLMs) to model responses to misinformation. We construct agent personas spanning five professions and three mental schemas, and evaluate their reactions to news headlines. Our findings show that LLM-generated agents align closely with ground-truth labels and human predictions, supporting their use as proxies for studying information responses. We also find that mental schemas, more than professional background, influence how agents interpret misinformation. This work provides a validation of LLMs to be used as agents in an agent-based model of an information network for analyzing trust, polarization, and susceptibility to deceptive content in complex social systems. 1 INTRODUCTION Protection against foreign information campaigns and the ability to conduct effective information operations are critical to modern national security. In an era where the information domain can be leveraged as a battlefield, there is a need to maintain information advantage, defined as "the use, protection, and exploitation of information to achieve objectives more effectively than enemies and adversaries do" (U.S. Achieving and sustaining information advantage requires not only the ability to disseminate compelling narratives but also to detect, counter, and mitigate adversarial information operations.
- North America > United States (1.00)
- Europe (1.00)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military > Army (0.89)
Learning to Interpret Weight Differences in Language Models
Goel, Avichal, Kim, Yoon, Shavit, Nir, Wang, Tony T.
Finetuning (pretrained) language models is a standard approach for updating their internal parametric knowledge and specializing them to new tasks and domains. However, the corresponding model weight changes ("weight diffs") are not generally interpretable. While inspecting the finetuning dataset can give a sense of how the model might have changed, these datasets are often not publicly available or are too large to work with directly. Towards the goal of comprehensively understanding weight diffs in natural language, we introduce Diff Interpretation Tuning (DIT), a method that trains models to describe their own finetuning-induced modifications. Our approach uses synthetic, labeled weight diffs to train a DIT-adapter, which can be applied to a compatible finetuned model to make it describe how it has changed. We demonstrate in two proof-of-concept settings (reporting hidden behaviors and summarizing finetuned knowledge) that our method enables models to describe their finetuning-induced modifications using accurate natural language descriptions.
- North America > United States (0.28)
- Europe (0.28)
- Asia (0.28)
- Media > Music (0.46)
- Leisure & Entertainment > Sports (0.46)
- Leisure & Entertainment > Games (0.46)
How News Feels: Understanding Affective Bias in Multilingual Headlines for Human-Centered Media Design
Ameen, Mohd Ruhul, Islam, Akif, Miah, Abu Saleh Musa, Siddiqua, Ayesha, Shin, Jungpil
News media often shape the public mood not only by what they report but by how they frame it. The same event can appear calm in one outlet and alarming in another, reflecting subtle emotional bias in reporting. Negative or emotionally charged headlines tend to attract more attention and spread faster, which in turn encourages outlets to frame stories in ways that provoke stronger reactions. This research explores that tendency through large-scale emotion analysis of Bengali news. Using zero-shot inference with Gemma-3 4B, we analyzed 300,000 Bengali news headlines and their content to identify the dominant emotion and overall tone of each. The findings reveal a clear dominance of negative emotions, particularly anger, fear, and disappointment, and significant variation in how similar stories are emotionally portrayed across outlets. Based on these insights, we propose design ideas for a human-centered news aggregator that visualizes emotional cues and helps readers recognize hidden affective framing in daily news.
- Media > News (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.89)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation
Ong, Keane, Mao, Rui, Varshney, Deeksha, Liang, Paul Pu, Cambria, Erik, Mengaldo, Gianmarco
Counterfactual reasoning typically involves considering alternatives to actual events. While often applied to understand past events, a distinct form, forward counterfactual reasoning, focuses on anticipating plausible future developments. This type of reasoning is invaluable in dynamic financial markets, where anticipating market developments can powerfully unveil potential risks and opportunities for stakeholders, guiding their decision-making. However, performing this at scale is challenging due to the cognitive demands involved, underscoring the need for automated solutions. LLMs offer promise, but remain unexplored for this application. To address this gap, we introduce a novel benchmark, FIN-FORCE (FINancial FORward Counterfactual Evaluation). By curating financial news headlines and providing structured evaluation, FIN-FORCE supports LLM-based forward counterfactual generation. This paves the way for scalable and automated solutions for exploring and anticipating future market developments, thereby providing structured insights for decision-making. Through experiments on FIN-FORCE, we evaluate state-of-the-art LLMs and counterfactual generation methods, analyzing their limitations and proposing insights for future research. We release the benchmark, supplementary data and all experimental codes at the following link: https://github.com/keanepotato/fin_force
- Asia > Singapore (0.04)
- North America > Canada (0.04)
- Europe > Germany (0.04)
- Government (1.00)
- Banking & Finance > Trading (1.00)
- Banking & Finance > Economy (1.00)
Do small language models generate realistic variable-quality fake news headlines?
McCutcheon, Austin, Brogly, Chris
Small language models (SLMs) have the capability for text generation and may potentially be used to generate falsified texts online. This study evaluates 14 SLMs (1.7B-14B parameters) including LLaMA, Gemma, Phi, SmolLM, Mistral, and Granite families in generating perceived low and high quality fake news headlines when explicitly prompted, and whether they appear to be similar to real-world news headlines. Using controlled prompt engineering, 24,000 headlines were generated across low-quality and high-quality deceptive categories. Existing machine learning and deep learning-based news headline quality detectors were then applied against these SLM-generated fake news headlines. SLMs demonstrated high compliance rates with minimal ethical resistance, though there were some occasional exceptions. Headline quality detection using established DistilBERT and bagging classifier models showed that quality misclassification was common, with detection accuracies only ranging from 35.2% to 63.5%. These findings suggest the following: tested SLMs generally are compliant in generating falsified headlines, although there are slight variations in ethical restraints, and the generated headlines did not closely resemble existing primarily human-written content on the web, given the low quality classification accuracy.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Detecting Manipulated Contents Using Knowledge-Grounded Inference
Meng, Mark Huasong, Wang, Ruizhe, Xu, Meng, Yan, Chuan, Bai, Guangdong
The detection of manipulated content, a prevalent form of fake news, has been widely studied in recent years. While existing solutions have been proven effective in fact-checking and analyzing fake news based on historical events, the reliance on either intrinsic knowledge obtained during training or manually curated context hinders them from tackling zero-day manipulated content, which can only be recognized with real-time contextual information. In this work, we propose Manicod, a tool designed for detecting zero-day manipulated content. Manicod first sources contextual information about the input claim from mainstream search engines, and subsequently vectorizes the context for the large language model (LLM) through retrieval-augmented generation (RAG). The LLM-based inference can produce a "truthful" or "manipulated" decision and offer a textual explanation for the decision. To validate the effectiveness of Manicod, we also propose a dataset comprising 4270 pieces of manipulated fake news derived from 2500 recent real-world news headlines. Manicod achieves an overall F1 score of 0.856 on this dataset and outperforms existing methods by up to 1.9x in F1 score on their benchmarks on fact-checking and claim verification.
- North America > United States (1.00)
- Asia (1.00)
- Europe > United Kingdom (0.67)
- Media > News (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Did ChatGPT or Copilot use alter the style of internet news headlines? A time series regression analysis
Brogly, Chris, McElroy, Connor
The release of advanced Large Language Models (LLMs) such as ChatGPT and Copilot is changing the way text is created and may influence the content that we find on the web. This study investigated whether the release of these two popular LLMs coincided with a change in writing style in headlines and links on worldwide news websites. 175 NLP features were obtained for each text in a dataset of 451 million headlines/links. An interrupted time series analysis was applied for each of the 175 NLP features to evaluate whether there were any statistically significant sustained changes after the release dates of ChatGPT and/or Copilot. There were a total of 44 features that did not appear to have any significant sustained change after the release of ChatGPT/Copilot. A total of 91 other features did show significant change with ChatGPT and/or Copilot although significance with earlier control LLM release dates (GPT-1/2/3, Gopher) removed them from consideration. This initial analysis suggests these language models may have had a limited impact on the style of individual news headlines/links, with respect to only some NLP measures.
- North America > Canada > Ontario > Simcoe County > Orillia (0.04)
- North America > United States > Michigan (0.04)
- Research Report > Experimental Study (0.64)
- Research Report > New Finding (0.50)
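The interrupted time series analysis mentioned in the abstract above is commonly set up as a segmented regression. The sketch below is one textbook form, not the authors' pipeline: it fits y = b0 + b1*t + b2*post + b3*(t - t0)*post by ordinary least squares, where `post` is 1 from the interruption index `t0` onward. The function names are hypothetical. It estimates only the coefficients; the significance testing and any autocorrelation handling a real analysis would need are left out.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_its(values, t0):
    """Fit the segmented regression via the normal equations.

    values: one feature's value per time step (index = time).
    t0: index of the first post-interruption time step.
    """
    rows = []
    for t in range(len(values)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, (t - t0) * post])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {"intercept": b0, "slope": b1,
            "level_change": b2, "slope_change": b3}


# Synthetic, noise-free series with a known break at t0 = 10 (illustration only).
series = [2 + 0.5 * t + ((3 + 0.25 * (t - 10)) if t >= 10 else 0)
          for t in range(20)]
print(fit_its(series, t0=10))
```

In this parameterization, `level_change` captures an immediate jump at the release date and `slope_change` captures a sustained change in trend afterwards, which is the kind of effect the abstract describes testing for each feature.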
KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines
Baranov, Alexander, Palatkina, Anna, Makovka, Yulia, Braslavski, Pavel
We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and words/phrases the wordplay refers to. Unlike the majority of existing humor collections of canned jokes, KoWit-24 provides wordplay contexts -- each headline is accompanied by the news lead and summary. The most common type of wordplay in the dataset is the transformation of collocations, idioms, and named entities -- the mechanism that has been underrepresented in previous humor datasets. Our experiments with five LLMs show that there is ample room for improvement in wordplay detection and interpretation tasks. The dataset and evaluation scripts are available at https://github.com/Humor-Research/KoWit-24
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
Contrastive Similarity Learning for Market Forecasting: The ContraSim Framework
Vinden, Nicholas, Saqur, Raeid, Zhu, Zining, Rudzicz, Frank
We introduce the Contrastive Similarity Space Embedding Algorithm (ContraSim), a novel framework for uncovering the global semantic relationships between daily financial headlines and market movements. ContraSim operates in two key stages: (I) Weighted Headline Augmentation, which generates augmented financial headlines along with a semantic fine-grained similarity score, and (II) Weighted Self-Supervised Contrastive Learning (WSSCL), an extended version of classical self-supervised contrastive learning that uses the similarity metric to create a refined weighted embedding space. This embedding space clusters semantically similar headlines together, facilitating deeper market insights. Empirical results demonstrate that integrating ContraSim features into financial forecasting tasks improves classification accuracy from WSJ headlines by 7%. Moreover, leveraging an information density analysis, we find that the similarity spaces constructed by ContraSim intrinsically cluster days with homogeneous market movement directions, indicating that ContraSim captures market dynamics independent of ground truth labels. Additionally, ContraSim enables the identification of historical news days that closely resemble the headlines of the current day, providing analysts with actionable insights to predict market trends by referencing analogous past events.
- Asia (0.15)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Oregon (0.14)
- Banking & Finance > Trading (1.00)
- Energy > Oil & Gas > Upstream (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
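The ContraSim abstract above describes a contrastive objective weighted by a fine-grained similarity score but does not give its formula. As a rough illustration of the general idea only, the sketch below implements a soft-weighted InfoNCE-style loss for a single anchor: the cross-entropy between the normalized similarity weights and a softmax over cosine similarities. This particular form, the function names, and the `temperature` default are assumptions, not the paper's WSSCL definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_contrastive_loss(anchor, candidates, weights, temperature=0.1):
    """Soft-weighted InfoNCE-style loss for one anchor embedding.

    candidates: embeddings contrasted against the anchor.
    weights: non-negative similarity scores, one per candidate; a larger
             weight asks the model to place that candidate nearer the anchor.
    """
    if len(candidates) != len(weights):
        raise ValueError("need exactly one weight per candidate")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must have a positive sum")
    logits = [cosine(anchor, c) / temperature for c in candidates]
    # Numerically stable log of the softmax normalizer.
    max_logit = max(logits)
    log_z = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Cross-entropy between normalized weights and softmax(logits).
    return -sum(w * (l - log_z) for w, l in zip(weights, logits)) / total_w


# Toy 2-D embeddings (illustration only): the first candidate is near the anchor.
anchor = [1.0, 0.0]
candidates = [[1.0, 0.1], [0.0, 1.0]]
print(weighted_contrastive_loss(anchor, candidates, weights=[0.9, 0.1]))
```

With binary weights this reduces to the usual single-positive InfoNCE loss; graded weights let partially similar headlines contribute partial attraction, which is the intuition the abstract describes.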